Abstract: One of the most common diseases around the world is diabetes. Invasive
methods involving blood samples via a finger prick are required to test for diabetes. These
procedures are uncomfortable and carry a risk of infection. Non-invasive testing is proposed as a
solution to this problem. To test the glucose levels of subjects, a shortwave
near-infrared-based optical detection system with a 950 nm wavelength sensor in reflective
mode is presented. The system collects the measured signal as voltage, transmittance,
absorbance, and reflectance to estimate glucose. The relation between voltage and predicted
glucose is evaluated from the absorbance, reflectance, and voltage for 575 samples. A
multiple linear regression (MLR) expression is used in the proposed method to enhance the
accuracy. The proposed method achieves a coefficient of determination (R²) of 99% and a
mean absolute relative difference (mARD) of 3.6% in real-time data analysis with the sensor. The root
mean square error (RMSE) is also calculated as 3.46 mg/dl. Three additional machine
learning classifiers are employed to achieve high accuracy in multi-class classification.
AdaBoosting and Gaussian Naive Bayes classifiers achieve an accuracy of 97% each.
Furthermore, the system computes performance metrics such as precision, recall, and F1-score,
and predicts the class on the test sample.

Keywords: Glucose, machine learning, near-infrared (NIR), noninvasive, transmittance, reflectance


Introduction 

According to statistics from the World Health 
Organization, the number of diabetic patients has doubled 
since 2015. In 2019, it was estimated that 9.3% (463 
million people) had diabetes, and its prevalence is 
projected to increase to 10.2% (578 million) and 10.9% 
(700 million) by 2030 and 2045, respectively (Sun et al., 
2022). In recent years, several glucose monitoring methods
have been developed. They fall into three
main categories: invasive, minimally invasive, and
noninvasive. The most widely used method of checking blood
glucose levels is to prick the finger with a conventional 
blood glucose meter via an invasive method (Gusev et al., 
2020). However, no matter how tiny or thin the needle is, 
it causes the patient pain, making the procedure difficult 
to incorporate into their daily lives. Additionally, invasive
glucometers are not cost-effective (Yeaw et al., 2012)
because they come with single-use strips that must be
replaced once used. Alternatively, minimally invasive
techniques causing little skin damage may be used (Chen 
et al., 2017). This method requires calibration more 
frequently than traditional measurement methods. These 
devices are expensive and have stability and lifespan 
problems (Smith et al., 2015). Therefore, these devices 
are unsuitable for regularly monitoring blood glucose 
levels. For these reasons, different researchers have
developed painless, accurate, and cost-effective
noninvasive methods of measuring blood glucose (Van
Enter and Von Hauff, 2018). In this way, regular blood
glucose monitoring could become a more relaxed and 
comfortable experience for millions of people. 

Some approaches have been proposed for non- 
invasive blood glucose detection, including in-vitro and 
in-vivo techniques (Jain et al., 2019). An in vitro 
approach involves studies or tests conducted outside a 
living organism, such as in a laboratory. In the in-vivo 
method, the test is conducted on a living organism, which
is more suitable for self-monitoring blood glucose levels. 
In view of the complexity of blood and tissue properties, 
optical technologies are particularly well suited to the 
detection of glucose in vivo. Another advantage of optical 
technologies is that they are less likely to irritate the 
targeted biological tissue. Various optical technologies,
including visible laser light, Raman spectroscopy, mid-infrared
(MIR), and near-infrared (NIR), have been used
to measure glucose. Of these, MIR and NIR have received more
research attention in recent years. An NIR signal has
wavelengths between 750 nm and 2500 nm, while an MIR
signal has wavelengths between 2500 nm and 10000 nm.
MIR penetrates only a few micrometers into human 
tissue, so it can only be used in the reflection mode 
(Heise, 2021). Therefore, NIR spectrometry is a suitable 
method for estimating blood glucose levels. In contrast to 
MIR, NIR light can penetrate through multiple layers of 
the skin and reach the subcutaneous vessels regardless of 
the pigmentation of the skin. Among these techniques, 
NIR spectroscopy has proven to be a useful method for 
determining glucose levels precisely (Goodarzi et al., 
2015). The NIR spectrum is further divided into two
regions: the long-wave NIR spectrum and the short-wave NIR
spectrum. Compared to long-wave NIR, short-wave NIR has a
deeper penetration capability beneath the skin, allowing it
to detect glucose molecules more accurately. Thus, the
proposed work focuses on the short-wave NIR reflectance 
spectroscopy technique at 950 nm with improved 
accuracy. The following sections discuss prior research 
and the novelty of the proposed approach. 

In other words, NIR has a deeper penetration into the
skin than most other infrared wavelengths. NIR
spectrum analysis can be divided into two
subcategories: the analysis of NIR spectrometry signals
and the analysis of NIR photoplethysmography (PPG) signals.
NIR PPG signals must be acquired with NIR LEDs, whereas an NIR
spectrometry signal must be analyzed by measuring
voltage after absorption and reflection. A review
summarizes these two categories,
emphasizing the machine learning analysis necessary to
estimate glucose using NIR PPG signals (Hina and
Saadeh, 2022). On the other hand, numerous studies in
the literature have demonstrated that NIR bandwidths and
characteristic spectra vary with blood glucose levels
(Jintao et al., 2017; Yang et al., 2018; Lee et al., 2019).
Further, NIR spectrometry can be divided into two
regions based on bandwidth: long-wave NIR and
short-wave NIR. NIR waves are partially scattered or
absorbed during penetration through the skin tissue. The
scattering and absorption by molecules in a medium occur
due to the vibrations of their chemical bonds. This
phenomenon makes it possible to determine the
concentration of glucose, which has the chemical formula
C6H12O6 (Pigman, 2012). The functional bonds of the
glucose molecule, which consist of C-H and C-O, can be
used to measure the absorption and reflectance of NIR
waves to determine glucose concentration in the blood.
During light absorption by biological tissue, glucose
molecules are easier to detect using long-wave NIR.
However, due to its shallow penetration, long-wave NIR
does not provide better results for in-vivo tests (Uwadaira
et al., 2016). On the other hand, short-wave NIR is
weakly absorbed by glucose molecules, but it can be
used for in-vivo testing due to its deep penetration. A
study by Jain et al. (2019) used both shorter regions to
estimate glucose levels. A total of three sensors operating
at 940 nm and 1300 nm were used: two 940 nm sensors
in absorbance and reflectance modes, and one 1300 nm
sensor in absorbance mode. According to these results, short NIR
regions have been studied more intensively for estimating blood
glucose levels.

The NIR absorption peaks for glucose isomers such as 
fructose, lactose, and galactose are not coincident with 
glucose absorption at approximately 950 nm (Simeone et 
al., 2017). Hence, these isomers do not adversely affect 
the detection of glucose. Also, a significant glucose 
absorption spike can be seen in the NIR between 930 nm 
and 960 nm (Yadav et al., 2015). NIR sensors can be used in
reflection or transmission mode, depending on the
specimen type and human body part selected (Villena et
al., 2019). In other words, the reflection configuration is
preferred for thick and dense samples, while the
transmission configuration is more effective for thin and
aqueous samples. Moreover, the reflective configuration
offers a wider selection of human body
parts than the transmission configuration. In a
previous study, NIR photodiodes with wavelengths of 
940 nm and 950 nm were used to measure blood glucose 
concentration levels non-invasively (Abidin et al., 2013). 
According to this study, 950 nm was the preferred 
wavelength of light for passing through blood glucose 
concentrations more effectively than 940 nm. Another 
method utilizes a 950 nm reflective sensor and a signal 
conditioning component with a 96% accuracy. This 
method requires 9 volts of power to measure glucose 
(Anupongongarch et al., 2019). 

On the other hand, multiple linear regression (MLR) is
a statistical technique used to analyze the relationship
between multiple independent variables and a dependent
variable (Montgomery et al., 2021). When the
relationship between the dependent variable and the
independent variables is not linear but exhibits a
nonlinear pattern, such as exponential growth or decay,
the logarithmic form is often considered to transform the
relationship into a linear form. Therefore, this approach
could be combined with the short-wave NIR technique to
enhance glucose monitoring accuracy further. Moreover,
machine learning (ML) has the potential to
revolutionize healthcare by improving disease diagnosis
(Chandrasekhar and Peddakrishna, 2023), the
classification of diabetes (Teki et al., 2021), where
binary classification was performed on the PID dataset, and
personalized treatment (Rajkomar et al., 2019). Various
ML techniques have been proposed to predict glucose and to
extract relevant features from measured or predicted data;
for example, six machine learning classifiers were applied
to a binary-class problem (Miriyala et al., 2022). Machine
learning has also been used in a real-time emotion
identification system based on ECG and temperature sensors. Random
forest (RF) has been employed to analyze continuous 
glucose monitoring data to predict the occurrence of 
hypoglycemic events in type 1 diabetes patients (Haak et 
al., 2017). Moreover, KNN algorithms have been used to 
classify glucose data based on their similarity to 
previously observed patterns (Ali et al., 2020). Thus, 
using machine learning techniques in glucose prediction 
and classification has yielded encouraging outcomes and 
this approach holds significant potential for enhancing 
diabetes management. 

This study aims to develop a glucose prediction and
classification approach by utilizing a combination of ML,
shortwave NIR techniques, and MLR. The data collected
from the sensor is utilized in MLR, which accurately 
predicts glucose levels. It explores the correlation 
between glucose concentration and signal 
absorbance/transmittance, using MLR to achieve high 
accuracy. Furthermore, ML algorithms are utilized to 
categorize glucose levels into multi classes, such as 
normal, hyperglycemic, and hypoglycemic, using the 
spectral data obtained from the non-invasive short-wave 
NIR technique, which measures glucose levels. This 
combined approach allows for the development of a 
reliable and accurate glucose monitoring system that can 
be employed for diabetes management. To
improve on the accuracy of previous work, a continuous
glucose monitoring system using NIR spectroscopy is
presented. This system employs a 950 nm reflective
sensor and achieves an accuracy (R²) of 99%.
This accuracy is achieved by using MLR. To
determine the accuracy of the proposed method, 184
subject samples are considered. Additionally, the present
work discusses the relationship between glucose
concentration and signal absorbance and transmittance.
The following section discusses the proposed method and
its implementation.


Design and Implementation 

The reflective glucose sensor at a wavelength of 950
nm is used to examine the variations in the sample's
optical properties. In reflective mode, the sensor measures the
quantity of light reflected from a finger with the aid of the
TCRT1000. However, this method may produce varying
baseline values due to differences in the optical
path. Therefore, it is critical to determine the appropriate
baseline values for the sensor and calibrate it accordingly
to guarantee accurate and precise measurements. To
implement this system, a TCRT1000 optical sensor, a
precise ADC, and a microcontroller
computation unit are needed, as shown in Fig. 1. When
light from the IR-emitting LED of the TCRT1000 strikes a
finger, the reflected light can be used to detect
glucose-induced energy absorption. A
current-limiting resistor (340 Ω) must be added in
series to protect the IR LED. The circuit design must also
include a resistor (47 kΩ) in series with the collector
of the light receiver. This limits the current going
through the phototransistor to prevent its destruction. The
signal produced by the sensor is fed to an ADS1115,
with the reflective sensor output connected to
single-ended input A0. The ADC is configured
with a gain of two-thirds and interfaced with the
microcontroller using the I2C bus protocol.

The signal received is converted into millivolts to
predict blood glucose. This is computed by the
microcontroller using the algorithm in Table 1, which
extracts the necessary data, namely voltage (x1),
transmittance (x2), absorbance (x3), and reflectance (x4),
to predict glucose concentration for the 950 nm sensor.

The obtained transmittance, absorbance, reflectance,
voltage values, and baseline calibrated value with the
reference device were analyzed. To calculate the
optical density (OD), or absorbance, of the human finger
medium, it is necessary to measure the transmittance of
the light (T), which is the ratio of the transmitted voltage
(Vt) from the medium to the incident voltage (Vi) from
the LED source. This can be represented by Equation (1).

\[ T = \frac{V_t}{V_i} \tag{1} \]

The optical density (OD), also known as the absorbance of
the human finger medium, can be represented by Equation (2).

\[ OD = -\log_{10}(T) \tag{2} \]

The total infrared signal emitted by the sensor is equal to
the sum of absorbance, transmittance, and reflections, as in
Equation (3).

\[ \text{Absorbance} + \text{Transmittance} + \text{Reflections} = 1 \tag{3} \]

The relationship between voltage, transmittance,
absorbance, and reflections of the infrared signals or loss
from the finger and sensors is evaluated. Figure 2 and
Figure 3 show the relationship between measured voltage
and predicted glucose from the reflective sensor with a
Table 1. Algorithm for computation of features
Algorithm
Input: adc0
Output: x1, x2, x3, x4 (input features for prediction of glucose)

Step 1: Read the sensor value from the finger into adc0 (ADS1115).
Step 2: Initialize mv ← 0, R ← 4.08 (offset value).
Step 3: Convert the sensor value into millivolts and scale to volts.
    mv ← (adc0 * 0.1875) / 1000
Step 4: Get 10 sample values from the sensor to smooth the value.
    a[i] ← mv
Step 5: Sort the data from small to large.
    b ← a[i]
    a[i] ← a[j]
    a[j] ← b
Step 6: Take the average value of the 6 center samples.
    d ← d + a[i]
    e ← d / 6
Step 7: Get the voltage value in V from the averaged samples.
    x1 ← e
Step 8: Calculate the transmittance of the signal.
    x2 ← x1 / R
Step 9: Calculate the absorbance of the signal.
    x3 ← -log(x2)
Step 10: Calculate the reflectance of the signal.
    x4 ← 1 - x2 - x3
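
The steps of Table 1 can be sketched in Python as follows. This is a minimal illustration rather than the firmware used in the study: the function names are hypothetical, the logarithm is taken as base 10 to agree with Equation (2), and the division by 1000 assumes that the 0.1875 factor is the ADS1115 step size in millivolts per count, so that x1 is expressed in volts.

```python
import math

ADS1115_MV_PER_COUNT = 0.1875  # assumed ADC step size in mV at a gain of two-thirds
OFFSET_R = 4.08                # offset value R from Table 1


def counts_to_volts(adc0):
    """Convert a raw ADC reading to volts (Step 3)."""
    return adc0 * ADS1115_MV_PER_COUNT / 1000.0


def trimmed_mean(samples, keep=6):
    """Sort the samples and average the `keep` central values (Steps 5 and 6)."""
    ordered = sorted(samples)
    start = (len(ordered) - keep) // 2
    centre = ordered[start:start + keep]
    return sum(centre) / keep


def compute_features(adc_readings, r=OFFSET_R):
    """Return (x1, x2, x3, x4): voltage, transmittance, absorbance, reflectance."""
    volts = [counts_to_volts(c) for c in adc_readings]
    x1 = trimmed_mean(volts)   # Step 7: smoothed voltage in V
    x2 = x1 / r                # Step 8: transmittance
    x3 = -math.log10(x2)       # Step 9: absorbance, base 10 as in Equation (2)
    x4 = 1.0 - x2 - x3         # Step 10: reflectance from Equation (3)
    return x1, x2, x3, x4


# Ten hypothetical raw readings (not study data).
features = compute_features([16000] * 10)
```

Discarding the two smallest and two largest of the ten readings before averaging makes the voltage estimate less sensitive to isolated spikes, for example from finger movement.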


[Block diagram: Sensor Unit (TCRT1000), ADC Unit, Computational Unit (Microcontroller)]

Figure 1. Block diagram of the proposed method
Figure 2. Relation between voltage and measured features

950 nm wavelength. In Figure 2, light transmittance
increases proportionally as the voltage increases,
depending on the glucose concentration in the subject's
finger. As the voltage decreases, absorbance increases,
again depending on the glucose concentration present in the
subject's finger. The minimum and
maximum voltage values obtained were 1.79 V and
3.78 V, respectively. At these two voltages, the
absorbance values were 0.443 and 0.120, respectively.
Similarly, the transmittance
values were 0.359 and 0.757, respectively, while the
reflection values were 0.196 and
0.126, respectively. Figure 3 shows the
predicted glucose range, from minimum to maximum,
against the absorbance, transmittance, and
reflections of the sensor. Here, as the predicted glucose
increases, the absorbance of the signal also rises, whereas
the transmittance of the signal decreases with decaying
behavior. Reflection of the signal shows lossy behavior relative to
the transmitted signal.
Glucose Prediction Using MLR Method 

The multiple linear regression (MLR) method is
utilized in this study to predict the glucose concentration
value by creating a linear combination of input variables,
namely the measured voltage, transmittance, absorbance,
and reflectance for 950 nm (represented by x1, x2, x3, and
x4, respectively). The output value is the predicted
glucose concentration (y) based on the reference Dr. Trust
glucometer measurement. The MLR model involves
fitting a linear equation to the data, with the NIR
measurements at 950 nm as independent variables and the
glucose value as the dependent variable. The ordinary
least squares (OLS) method is commonly used to find optimal
parameter values that minimize the error between
the actual glucose concentration values and the predicted
values. The MLR
model is considered appropriate for accurate glucose
concentration prediction. The
modeling process is performed multiple times until the optimal
coefficient values are achieved. In this study, MLR is
applied to four independent predictors to find the
prediction (ln y) in natural logarithmic form because the
input predictors demonstrate an exponential increase and
decay with respect to voltage, as shown in equation (4).
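
To illustrate how a log-form MLR model can be fitted by ordinary least squares, the sketch below solves the normal equations for ln(y) = b0 + b1 ln(x1) + ... using only the Python standard library. It is an illustration under stated assumptions and not the authors' fitting code: the function names are hypothetical, and the data are synthetic values generated from a known power law so that the recovered coefficients can be checked.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_log_linear(features, targets):
    """OLS fit of ln(y) = b0 + sum_i b_i * ln(x_i); returns [b0, b1, ...]."""
    rows = [[1.0] + [math.log(v) for v in f] for f in features]
    log_y = [math.log(t) for t in targets]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T ln(y)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * ly for r, ly in zip(rows, log_y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data generated from y = 50 * u^2 / v (hypothetical, not study data).
synthetic_x = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 2.5)]
synthetic_y = [50.0 * u ** 2 / v for u, v in synthetic_x]
coefficients = fit_log_linear(synthetic_x, synthetic_y)  # close to [ln 50, 2, -1]
```

Because the model is linear in the logarithms, the least-squares solution is obtained in closed form; the solver reports an error when the predictors are exactly collinear, since the coefficients are then not uniquely determined.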


[Plot: Transmittance, Absorbance, and Reflectance versus Glucose (mg/dl)]

Figure 3. Predicted glucose and measured
features for the reflective sensor

\[ \ln(y) = 4.105 + 6.33 \ln x_1 + 3.4751 \ln x_2 + 9.005 \ln x_3 - 5.587 \ln x_4 \tag{4} \]


The product and power rules of the exponential were
applied to recover y and fit the model, with reference to the y
given in equation (6). This further improved the
R² value, which is now 99%.
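
As a small worked illustration of the exponentiation step, the sketch below evaluates the log-form model and the equivalent product-of-powers form and shows that they agree. The coefficient values are those printed in equation (4); their assignment to x1 through x4 in that order is an assumption, and the function names and input values are hypothetical.

```python
import math

# Coefficients as printed in equation (4); the x1..x4 ordering is an assumption.
INTERCEPT = 4.105
COEFFS = (6.33, 3.4751, 9.005, -5.587)


def predict_glucose(x1, x2, x3, x4):
    """Evaluate y = exp(ln y) with ln y given by the log-form model."""
    ln_y = INTERCEPT + sum(
        b * math.log(x) for b, x in zip(COEFFS, (x1, x2, x3, x4))
    )
    return math.exp(ln_y)


def predict_glucose_power_form(x1, x2, x3, x4):
    """Same model after applying the product and power rules:
    y = e^b0 * x1^b1 * x2^b2 * x3^b3 * x4^b4."""
    y = math.exp(INTERCEPT)
    for b, x in zip(COEFFS, (x1, x2, x3, x4)):
        y *= x ** b
    return y


# Arbitrary positive inputs, used only to compare the two forms.
sample_inputs = (2.5, 0.5, 0.3, 0.2)
log_form = predict_glucose(*sample_inputs)
power_form = predict_glucose_power_form(*sample_inputs)
```

Both functions describe the same model; the power form makes explicit that each predictor enters multiplicatively with its regression coefficient as the exponent.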

y’ ~ (0.223 a Gs hea 

A scatter plot was created for the output versus each
input variable, and linearity was assessed with reference
to the device, as shown in Fig. 4. The coefficient of
determination between the input variables and
the target variable was calculated at 99%, and the root
mean square error (RMSE) is 3.46 mg/dl.

[Scatter plot: Predicted (mg/dl) versus Reference (mg/dl)]

Figure 4. Reference (Dr. Trust) and
predicted values show a linear relationship


The technique characterizes the relationship between
voltages from the sensor and predicts blood glucose
concentration with respect to the reference glucose device
(Dr. Trust glucometer). The input features from the detector
are the independent variables related to the expected
glucose response of the 950 nm sensor. The proposed
model was developed with 575 samples from 289 subjects
aged 19-69, collected randomly in blood glucose test
mode. Precision was evaluated based on the mean
absolute relative difference (mARD), mean absolute
deviation (MAD), RMSE, and average error, with the
proposed method minimizing overall error. Performance
is evaluated using equations (7), (8), and (9).


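\[ \mathrm{mARD} = \frac{1}{n} \sum_{j=1}^{n} \frac{\left| BG_{\mathrm{Ref},j} - BG_{\mathrm{Pre},j} \right|}{BG_{\mathrm{Ref},j}} \times 100 \tag{7} \]

\[ \mathrm{MAD} = \frac{1}{n} \sum_{j=1}^{n} \left| BG_{\mathrm{Ref},j} - BG_{\mathrm{Pre},j} \right| \tag{8} \]

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} \left( BG_{\mathrm{Ref},j} - BG_{\mathrm{Pre},j} \right)^2} \tag{9} \]

BGPre and BGRef represent the predicted and
reference blood glucose concentration values,
respectively. With n = 575 samples, the mARD is 3.6%,
MAD is 2.91 mg/dl, and RMSE is 3.46 mg/dl, indicating
high precision.

\[ \mathrm{AvgErr} = \frac{1}{n} \sum_{j=1}^{n} \frac{\left| V_{md,j} - V_{m,j} \right|}{V_{md,j}} \times 100 \tag{10} \]
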

The minimum deviated value Vmd and the measured
value Vm are used to calculate the AvgErr using equation
(10), which is found to be 3.73%. The coefficient of
determination (R²) is also calculated and found to be
0.99. To assess the clinical accuracy of the proposed
system, a Clarke error grid (CEG) analysis is performed,
and the glucose values are shown in Figure 5. The
measured glucose values fall within the clinically
accepted zone, known as Zone A.
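
The error metrics of equations (7) to (9) can be computed as in the following sketch. The function names are illustrative and the paired readings are hypothetical values chosen for easy checking; they are not data from the study.

```python
import math


def mard(reference, predicted):
    """Mean absolute relative difference in percent, equation (7)."""
    n = len(reference)
    return 100.0 * sum(abs(r - p) / r for r, p in zip(reference, predicted)) / n


def mad(reference, predicted):
    """Mean absolute deviation in mg/dl, equation (8)."""
    n = len(reference)
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / n


def rmse(reference, predicted):
    """Root mean square error in mg/dl, equation (9)."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


# Hypothetical paired readings in mg/dl (not study data).
reference = [100.0, 150.0, 200.0, 250.0]
predicted = [98.0, 153.0, 196.0, 255.0]

example_mard = mard(reference, predicted)  # each relative error is 2%
example_mad = mad(reference, predicted)    # (2 + 3 + 4 + 5) / 4
example_rmse = rmse(reference, predicted)  # sqrt((4 + 9 + 16 + 25) / 4)
```

Because mARD normalizes each error by the reference value, it weights errors at low glucose levels more heavily than MAD or RMSE, which is why the three metrics are usually reported together.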


[Clarke Error Grid, MLR regression, mARD = 3.6%: Prediction Concentration (mg/dl) versus Reference Concentration (mg/dl)]

Figure 5. CEG for train data with respect to reference
and predicted glucose


Table 2 presents a comparison of the proposed method
with existing literature. Jain et al. (2019) use three
different NIR sensors, while Anupongongarch et al.
(2019) employ only one sensor. Larin et al. (2002)
utilized OCT technology, but most parameters were not
measured. Song et al. (2015) used two different
technologies, yet the average error rate was still 19%.
Photoacoustic technology was employed with CEGs of
93% and 100%, respectively (Pai et al., 2017a & 2017b),
but it is known to be very expensive. Visible laser light
technology was used by Ali et al. (2017), and the average
reported error rate was between 8% and 10%. In light of
these findings, the proposed approach is deemed superior
to the above technologies, with a high R² value.
Table 2. Comparison of performance parameters with previous work

Parameter | Proposed Method | Jain et al., 2019 | Anupongongarch et al., 2019 | Larin et al., 2002 | Song et al., 2015 | Pai et al., 2017a | Pai et al., 2017b | Ali et al., 2017
R² value | 0.99 | 0.908 | 0.96 | 0.95 | - | - | - | -
mARD (%) | 3.60 | [illegible] | - | - | 8.30 | 8.84 | 7.01 | -
AvgErr (%) | 3.43 | 5.07 | - | - | 19 | - | - | 8-10
MAD (mg/dl) | 2.91 | 3.87 | - | - | - | 32.8 | [illegible] | -
RMSE | 3.46 | 5.61 | 11 | - | - | 43.64 | 7.64 | -
CEG (A&B %) | 100 | 100 | - | - | 100 | 93 | 100 | 98
Technology | NIR | NIR | NIR | OCT | Impedance and NIR | Photoacoustic | Photoacoustic | Visible laser light
System cost | Cheaper | Cheaper | Cheaper | Costly | Cheaper | Costly | Costly | Cheaper

[Plot with legend entries: Reference, Predicted]

Figure 6. Measured glucose for reference and
predicted sensor for the number of samples

Further, Figure 6 compares the measured glucose
range of the reference and sensor results for the
575 samples. In addition, Figure 7
demonstrates that both methods exhibit a glucose range
of 60 mg/dl to 430 mg/dl, indicating the ability of the
proposed device to perform accurate measurements.

[Box plot; legend: 25%~75%, range within 1.5 IQR]

Figure 7. Measured glucose range for reference and
predicted sensor

Glucose Multiclass Level Prediction Using ML Methods

An ML classifier is utilized to categorize glucose
levels into different categories based on predicted values.
Multiclass classification data obtained from a 950 nm
wavelength sensor is categorized into three classes within
the glucose concentration range of 60 mg/dL to 430
mg/dL. The three classes are defined as follows:
glucose concentrations less than 80 mg/dL
(hypoglycemic range) as class 0, greater than 180 mg/dL
(hyperglycemic range) as class 2, and concentrations
between these values (normal range) as class 1. The
input for the machine learning classification algorithms
comprises the data points (x1, x2, x3, and x4), and the
classification models are trained to predict the class
labels of new cases accurately. The multi-class
classification was performed using three classification
algorithms: AdaBoosting (AB), Decision Tree (DT), and
Gaussian Naive Bayes (GNB). To achieve the best
performance of the classification algorithms, fine-tuning the
model is crucial. The hyperparameter values are
set using GridSearchCV before the training process
begins. Repeated looping through predefined
hyperparameters helps to fit the model to the training set.
Table 3 lists the best-tuned hyperparameter values for
the three algorithms.
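
The three-class labelling described in this section (below 80 mg/dL as class 0, above 180 mg/dL as class 2, otherwise class 1) can be written as a short helper. The function name and the sample values are hypothetical; treating readings of exactly 80 or 180 mg/dL as class 1 follows from the strict inequalities stated in the text.

```python
HYPO_LIMIT = 80.0    # mg/dL; readings below this are class 0 (hypoglycemic)
HYPER_LIMIT = 180.0  # mg/dL; readings above this are class 2 (hyperglycemic)


def glucose_class(glucose_mg_dl):
    """Map a glucose value in mg/dL to class 0, 1, or 2."""
    if glucose_mg_dl < HYPO_LIMIT:
        return 0
    if glucose_mg_dl > HYPER_LIMIT:
        return 2
    return 1  # normal range, including the boundary values 80 and 180


# Hypothetical readings in mg/dL (not study data).
readings = [62.0, 80.0, 120.0, 180.0, 181.0, 430.0]
labels = [glucose_class(g) for g in readings]
```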
GNB classifier 

The GNB classifier is a probabilistic ML model that is
commonly used for classification tasks. The algorithm is
based on Bayes' theorem and initially estimates the mean
and variance of each feature for each class label based on
the training data. When presented with new data, the
algorithm calculates the conditional
probability of each class label given the observed features
by applying Bayes' theorem with the previously estimated
mean and variance values.
In this study, 10-fold cross-validation was used to
evaluate the algorithm's performance. The obtained
results are presented in Fig. 8. The model's accuracy was
evaluated for each fold, resulting in the following
accuracies: (1, 1, 0.977, 1, 0.977, 1, 1, 1, 0.977, 1), with a
mean accuracy of 0.993. The accuracy loss for each fold
was (0, 0, 0.02325581, 0, 0.02325581, 0, 0, 0,
0.02325581, 0). The overall test accuracy of the Gaussian
Naive Bayes classifier was 96.53%. During testing, one of the
data points was predicted to belong to class 2, which
matched the original class.
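
The estimation and prediction steps of a Gaussian Naive Bayes classifier can be sketched in plain Python as follows. This is a simplified illustration, not the implementation used in the study: the function names and the toy data are hypothetical, and the `var_smoothing` term is modelled here as a fraction of the largest feature variance added to every class variance for numerical stability.

```python
import math
from collections import defaultdict


def fit_gnb(samples, labels, var_smoothing=1e-9):
    """Estimate per-class priors, feature means, and feature variances."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    n_features = len(samples[0])
    # Smoothing term: a fraction of the largest feature variance over all data.
    all_var = []
    for j in range(n_features):
        col = [x[j] for x in samples]
        mu = sum(col) / len(col)
        all_var.append(sum((v - mu) ** 2 for v in col) / len(col))
    eps = var_smoothing * max(all_var)
    model = {}
    for y, rows in grouped.items():
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + eps
            for j in range(n_features)
        ]
        model[y] = (len(rows) / len(samples), means, variances)
    return model


def predict_gnb(model, x):
    """Return the class with the highest log-posterior for sample x."""
    def log_posterior(params):
        prior, means, variances = params
        total = math.log(prior)
        for value, mu, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


# Toy one-feature data set with three well-separated classes (not study data).
toy_x = [(1.0,), (1.2,), (0.9,), (2.0,), (2.1,), (1.9,), (3.0,), (3.2,), (2.9,)]
toy_y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
toy_model = fit_gnb(toy_x, toy_y)
toy_predictions = [predict_gnb(toy_model, x) for x in [(1.1,), (2.05,), (3.1,)]]
```

Working with log-probabilities avoids numerical underflow when several per-feature likelihoods are multiplied together.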
Table 3. Best-tuned hyperparameter values

Classifier | GridSearchCV hyperparameter tuning values
GNB | var_smoothing=0.0004328761281083057
AB | learning_rate: 0.1, n_estimators: 10
DT | 'max_leaf_nodes': list(range(2, 100)), 'min_samples_split': [2, 3, 4], cv=10, verbose=1

[Plot: Accuracy and Accuracy Loss Curve (GNB)]

Figure 8. Accuracy and loss for each fold for the GNB
classifier

AB classifier

The AB algorithm is a widely used ensemble learning
approach for classification tasks. Its working principle
involves iteratively training a series of weak classifiers.
The one that performs the best on the weighted data is
selected and added to the ensemble. The final classifier is
a weighted combination of the weak classifiers, where the
weight of each weak classifier is proportional to its
performance on the training data. One of the advantages
of AB is that it is less prone to overfitting than a single,
more complex classifier. The accuracy and the
accuracy loss for 10-fold cross-validation, which show good
model performance, are presented in Figure 9. The
accuracy of the trained model was evaluated for each
fold, resulting in accuracies of (0.955, 1, 1, 1, 0.977,
0.953, 1, 0.977, 0.977, 1), with a mean accuracy of 0.984.
The accuracy loss for each fold was (0.04545455, 0, 0, 0,
0.02325581, 0.04651163, 0, 0.02325581, 0.02325581, 0).
The overall test accuracy for AdaBoosting was 97.22%.
During testing, one of the data points was predicted to
belong to class 0, which matched the original class.

[Plot: Accuracy and Accuracy Loss Curve (AB)]

Figure 9. Accuracy and loss for each fold for the AB
classifier

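
The per-fold figures reported for the AB classifier can be checked with a few lines of arithmetic. The sketch below recomputes the mean accuracy from the listed fold accuracies and shows that the listed loss values coincide with one or two misclassified samples in a fold of 43 or 44 samples; the fold sizes are not stated in the text, so this reading is an inference, and the helper name is hypothetical.

```python
# Per-fold accuracies reported for the AB classifier.
ab_fold_accuracy = [0.955, 1, 1, 1, 0.977, 0.953, 1, 0.977, 0.977, 1]

mean_accuracy = sum(ab_fold_accuracy) / len(ab_fold_accuracy)
fold_loss = [1 - a for a in ab_fold_accuracy]  # loss taken as 1 - accuracy


def loss_from_errors(n_errors, n_samples):
    """Accuracy loss when n_errors of n_samples are misclassified."""
    return n_errors / n_samples


# Assumed fold sizes of 43 and 44 samples reproduce the reported loss values.
one_error_in_43 = round(loss_from_errors(1, 43), 8)
two_errors_in_43 = round(loss_from_errors(2, 43), 8)
two_errors_in_44 = round(loss_from_errors(2, 44), 8)
```

Each non-zero loss in the list therefore corresponds to only one or two misclassified samples per fold, which is useful context when comparing folds.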
DT classifier 

Thirdly, the DT classifier algorithm is used for
classification tasks. It creates a model that predicts the
target variable's value by learning simple decision rules
inferred from the data features. The model takes the form
of a tree-like structure where each internal node
represents a test on an attribute, each branch represents
the outcome of the test, and each leaf node represents a
class label. To make predictions on new data, the
algorithm traverses the tree from the root node to a leaf
node that corresponds to a class label, and the prediction
is based on the majority class of the training samples that reach
that leaf node. The accuracy, loss, and model
performance for 10-fold cross-validation are calculated
and presented in Figure 10. The accuracy of the trained
model was evaluated for each fold, resulting in accuracies
of (0.977, 0.953, 0.93, 0.93, 0.837, 0.93, 1, 0.814, 0.977,
0.953), with a mean accuracy of 0.93. The accuracy loss
for each fold was (0.13636364, 0.13953488, 0.02325581,
0.09302326, 0.09302326, 0.11627907, 0, 0.09302326,
0.02325581, 0.02325581). The overall test accuracy for
DT was 95.14%. During testing, one of the data points
was predicted to belong to class 2, which matched the
original class.


To compare the performance of three machine 
learning classifier models for the multiclass problem with 
the three classes (0, 1, and 2) of predicting the glucose 
range in diabetic patients, the performance of the three 
models was evaluated by measuring precision, recall, and
F1-score. The results of these evaluations are presented in
Figures 11 to 13.
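
Per-class precision, recall, and F1-score for a three-class problem can be computed as in the sketch below. The function name and the label vectors are hypothetical and serve only to show the calculation; they are not predictions from the study.

```python
def per_class_scores(y_true, y_pred, labels=(0, 1, 2)):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical true and predicted class labels (not study data).
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 1, 2, 2, 1]
scores = per_class_scores(y_true, y_pred)
```

Reporting the three metrics per class, rather than a single accuracy figure, shows whether a classifier is weaker on the less frequent hypoglycemic or hyperglycemic classes.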
[Plot: Accuracy and Accuracy Loss Curve (DT), per fold]

Figure 10. Accuracy and loss for each fold for the DT
classifier

After evaluating the performance of the three models,
it was found that the AB model achieved an overall
accuracy of 97.22%, the GNB model achieved an overall
accuracy of 96.53%, and the DT model achieved an
overall accuracy of 95.14%. These accuracy values
indicate that all three models could accurately predict the
glucose range for diabetes patients across the three
classes, with the AB model having the highest overall
accuracy. These results demonstrate the potential of ML
models in accurately predicting the glucose range for
diabetes patients, which can aid in disease management
and treatment.

Figure 11. Comparison of three ML classifiers for
Class 0

Figure 12. Comparison of three ML classifiers for
Class 1 (glucose between 80 and 180 mg/dl)

Figure 13. Comparison of three ML classifiers for
Class 2 (glucose > 180 mg/dl)


Discussion 

The proposed approach for detecting blood glucose
levels utilizes short-wave near-infrared (NIR) technology
at 950 nm. This approach outperforms other technologies
with high accuracy (R²). This study measured the blood
glucose history of 282 T2D and 7 T1D patients under
medical supervision at the VIT-AP University health
center. All participants provided informed consent under
the Helsinki guidelines. Blood glucose levels were
measured for 5 minutes using the proposed sensor glucose
monitoring system and a reference device, the Dr. Trust
finger-prick device. Data from 289 subjects, including
males and females aged 19-69 years with hypo, normal,
and hyperglycemia, were analyzed, resulting in a total of
575 blood glucose level samples obtained through a
glucometer (ranging from 62 to 400 mg/dL). The study
identified hypoglycemic (BG level < 80 mg/dL), normal
(BG level from 80 to 180 mg/dL), and hyperglycemic (BG
level > 180 mg/dL) levels. The reference device, Dr.
Trust, uses the glucose dehydrogenase (GDH) flavin adenine
dinucleotide (FAD) enzyme (GDH-FAD) with a
measuring range of 30-600 mg/dL and requires 0.5 µl of
blood. This device was validated using HbA1c lab tests,
with the results showing 99% accuracy compared to the
reference device. The lab test HbA1c values were
converted from mmol/L to mg/dL for the validation of
the reference device (Dr. Trust).


Conclusions 

The proposed system with a 950 nm wavelength sensor was
used in this study to determine subjects' glucose levels
without invasive methods. Measured non-invasive
glucose values are compared with invasive glucose
measurements from a gold standard Dr. Trust glucose
meter. A total of 575 real-time samples are collected
from random glucose measurements of 289 subjects.
A regression expression is utilized in the suggested strategy
to increase accuracy based on real-time data analysis. In
real-time data analysis with the proposed method for the
sensor, the R² and mARD reach 0.99 and 3.6%,
respectively. Additionally, the RMSE obtained is
3.46 mg/dl. Three ML classification methods were
used for multiclass prediction; two classifiers achieved 97%
accuracy and one classifier achieved 95%. Based on these parameters,
the proposed method appears to be more efficient than the
existing literature. A limitation of the present work is
the form of the system, which can be further enhanced into a
portable device. From the statistical point of view, more
subjects with Type 1 diabetes should be tested to analyze
time series responses in future work.